Abstract To address the problem of unsupervised outlier detection in wireless sensor
networks, we develop an approach that (1) is flexible with respect to the outlier
definition, (2) computes the result in-network to reduce both bandwidth and energy 
usage, (3) only uses single hop communication thus permitting very simple node failure 
detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) 
seamlessly accommodates dynamic updates to data. We examine performance using 
simulation with real sensor data streams. Our results demonstrate that our approach 
is accurate and imposes a reasonable communication load and level of power consump- 
tion. 

Keywords Outlier detection • Wireless sensor networks 



1 Introduction 

Outlier detection, an essential step preceding almost any data analysis routine, is used
either to suppress or to amplify outliers. The first usage (also known as data cleansing)
improves the robustness of data analysis. The second usage helps in searching for rare
patterns in such domains as fraud analysis, intrusion detection, and web purchase
analysis (among others).

Several factors make wireless sensor networks (WSNs) especially prone to outliers. 
First, they collect their data from the real world using imperfect sensing devices. Sec- 
ond, they are battery powered and thus their performance tends to deteriorate as power 
dwindles. Third, since these networks may include a large number of sensors, the chance 
of error accumulates. Finally, in their usage for security and military purposes, sensors 
are especially prone to manipulation by adversaries. Hence, it is clear that outlier de- 
tection should be an inseparable part of any data processing routine that takes place 
in WSNs. 

Simply put, outliers are events with extremely small probabilities of occurrence. 
Since the actual generating distribution of the data is usually unknown, direct compu- 
tation of probabilities is difficult. Hence, outlier detection methods are, by and large, 
heuristics. Because the problem is fundamental, a huge variety of outlier detection 
methods have been developed. In this paper we focus on non-parametric, unsupervised 
methods. A simplistic implementation of these methods would require centralization of 
the data. Such centralization is hard and costly in WSNs as it demands high bandwidth 
and requires reliable message transmission over multiple hops, which is both costly and 
difficult to implement. 

We developed a technique for the computation of outliers in WSNs. This technique
(1) is flexible with respect to the outlier definition, (2) computes the result in-network to
reduce both bandwidth and energy usage [27], (3) only uses single-hop communication,
thus permitting very simple node failure detection and message reliability assurance
mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to
data. In addition to these essential features, the algorithm presented here also has two
highly desirable properties: it is generic (suitable for many outlier detection heuristics),
and its communication load is proportional to the outcome (i.e., the number of outliers
reported).

We exemplify the benefits of our algorithm by implementing it using two different 
outlier detection heuristics and simulating 53 sensors using the SENSE sensor net- 
work simulator [18] with real sensor data streams. Our results show that the algorithm 
converges to an accurate result with reasonable communication load and power con- 
sumption. In most tested cases, our algorithm's performance bests that of a centralized 
approach. 

2 Motivating Application 

The potential importance of efficient outlier detection in wireless sensor networks is 
best understood in the context of popular applications of those systems. Consider, for 
instance, the acoustic source localization problem. In this problem, a set of synchronized 
sensors all register the arrival of a specific sound at a certain time. Given the distance 
of two sensors from one another and the time difference of arrival (TDOA) of the 
sound, the potential locations of the source vis-a-vis the two sensors can be deduced. 
Given data from several sensors, the possible relative locations (each a hyperbola in 
the plane) can be intersected, and the location of the source can be pinpointed (see,
for example, [3,51] and Fig. 1).
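
To make explicit why each sensor pair yields a hyperbola, the TDOA constraint can be written out under idealized assumptions (flat, unobstructed terrain and a constant propagation speed). The symbols below are notation assumed here for illustration and do not appear in the original text: $s_1, s_2$ are the positions of the two sensors, $t_1, t_2$ the arrival times they register, $c$ the speed of sound, and $x$ the unknown source position.

```latex
\[
  \lVert x - s_1 \rVert \;-\; \lVert x - s_2 \rVert \;=\; c\,(t_1 - t_2)
\]
```

The set of points whose distances to two fixed foci differ by a constant is one branch of a hyperbola with foci $s_1$ and $s_2$, which is why intersecting the curves obtained from several sensor pairs narrows down the source location.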

While the theoretic framework of TDOA-based source location is simple and clear,
the problem becomes much more complex in reality. Firstly, the real terrain in which
the problem occurs is rarely flat, or an unobstructed three-dimensional space. Secondly,
echoes and multiple concurrent sounds may add many possible hyperbolas from which
the relevant ones need to be selected. Last, and perhaps most importantly, the method is
sensitive to erroneous initializations in terms of sensor synchronization and positioning,
as well as to the possible degradation of these factors when sensors' power dwindles.
All these factors amount to a multiplicity of possible hyperbolas, only a few of which
intersect at the correct location of the source.

Fig. 1 The expansion of a sound over time and the possible source location as computed by
two different pairs of sensors according to the time difference of arrival. The origin of the sound
lies in the intersection of the two hyperbolas.

In fact, a similar principle of localization applies in a broader setting, in which
imprecise detection by a single sensor, regardless of its modality (e.g., acoustic, seismic,
visual, electromagnetic, etc.), is applied in so-called binary sensing object location [54],
in which a group of neighboring nodes cooperate to narrow down the object location. In
fact, a detection, true or false, of object presence in the sensing range of a sensor will
trigger a tracking algorithm or even the entire tracking service [19]. Hence, to avoid the
costs associated with unnecessary execution of the tracking algorithm or service, wrong
data (whatever the cause) must be detected and removed. There are ample methods
by which such detection and removal can be carried out (e.g., Maximum Likelihood
[50]). However, they all rely on centralization of all of the data for processing. Such
centralization would likely be unacceptably costly in wireless sensor networks for two 
main reasons. First, because a huge portion of the energy of a sensor would be spent on 
relaying data of other sensors. Second, because naive centralization would make no use 
of old data when new data arrives, even if the data changes only slightly. For instance, 
if a certain sensor produces unwanted signals (say, due to a local noise source), that 
sensor and every sensor relaying its data to the center would constantly waste energy 
on centralization of the data even though it might clearly be undesired. 

It is therefore of high importance to be able to perform data cleansing in the network
concurrent to any decision protocol. With the method suggested in this paper, sensors 
can constantly and efficiently prune away data which seems false. Only then, and only 
if the remaining data seems to require further analysis, would the more complex and 
costly procedure for source localization be executed. In this way, much energy can be 
saved and system lifetime can be extended. 

This paper presents an efficient algorithm for in-network outlier detection. The 
algorithm is generic, permitting several definitions of an outlier. The experimentation 
and simulation results are presented for this algorithm and not for the entire motivating
example because source localization and tracking using wireless sensor networks is well 
understood (e.g., see [10]). 

3 Related work 

3.1 Outlier detection 

Outlier detection is a long-studied problem in data analysis; we provide only a brief
sampling of the field. Hodge and Austin [28] present a survey focusing on outlier de- 
tection methodologies based on machine learning and data mining. These include dis- 
tance and density-based unsupervised methods, feed-forward neural networks and deci- 
sion tree-based supervised methods, and auto-associative neural network and Hopfield 
network-based methods. Barnett and Lewis [7] provide a survey of outlier detection 
methodologies in the statistics community. 

Our algorithm is flexible in that it accommodates a whole class of unsupervised
outlier detection techniques such as (1) distance to the $k$-th nearest neighbor [46], [9], (2)
average distance to the $k$ nearest neighbors [6], [9], and (3) the inverse of the number
of neighbors within a distance $\alpha$ [35] (see Section 4 for details).

3.2 Wireless sensor networks 

WSNs combine the capability to sense, compute, and coordinate their activities with 
the ability to communicate results to the outside world. They are revolutionizing data 
collection in all kinds of environments. At the same time, the design and deployment 
of these networks creates unique research and engineering challenges due to their ex- 
pected large size (up to thousands of sensor nodes), their often random and hazardous 
deployment, obstacles to their communication, their limited power supply, and their 
high failure rate. 

The software for WSNs needs to be aware of their limitations and features. The 
most important among these are limited power, high communication cost, and limited 
direct communication range. In [26], Estrin et al. introduce scalable coordination as 
an important component of the needed software. A survey of the state-of-the-art in 
WSNs is given in [5]. Another survey [4] focuses on challenges arising from specific
applications such as military, health care, ecology, and security. 

Energy-efficiency, a cardinal WSN requirement, is often achieved by minimizing 
communication using topology-control algorithms that dictate the active/sleep cycles 
of sensor nodes. Examples include Geographic Adaptive Fidelity (GAF) [59], ASCENT 
[17], STEM [47], and ESCORT [14]. While the focus of this paper is on WSN outlier
detection, the challenge is the same as in the above-mentioned works. Hence, while
we do not propose a topology-control algorithm, we aim to design an energy-efficient 
algorithm by minimizing the required communication overhead. 

Other research efforts have also addressed the issue of developing a framework for 
distributed outlier detection in WSNs. 

The framework of Zhuang et al. [61] uses a weighted moving average approach to
smooth noise from the data stream arriving at each sensor. In addition to temporal 
information (past data values), sensors also use data from neighboring sensors (spatial 
smoothing) to reduce the rate at which data values are propagated to the sink. When 
an observed data value remains within the established spatio-temporal trend, it is not 
propagated. Their approach differs from ours in that theirs does not seek to detect 
outliers. 

The framework of Sheng et al. [49] allows the discovery of k-nearest-neighbor based
outliers: points whose distance to their k-nn exceeds a fixed threshold or the top n points
with respect to the distance to their k-nns. Each sensor maintains a histogram-type summary
of pertinent information over a sliding window of its data points. This summary
is propagated to a sink node. The sink node collects the summaries and queries the network
for any additional information needed to correctly determine the outliers over the
whole network. The use of summaries allows their approach to use less communication
than a naive, centralized approach. Their approach differs from ours in several ways.
First, they only detect outliers over one-dimensional data. Indeed, extending their approach
to more dimensions is complicated by the fact that compact, multi-dimensional
histograms are difficult to build. Second, they only consider the two k-nn based outlier
definitions described above, while our approach encompasses these and more. Third,
their approach only applies in settings where spatial proximity is unimportant (data
from all sensors, near and far, is used in determining outliers). We have developed an
approach that considers spatial proximity ("semi-local" outlier detection) as well as
one that does not.

The framework of Subramaniam et al. [53] requires the sensors to maintain a tree 
communication topology and computes outliers based on an estimate of the underlying 
probability distribution from which the data arises. Such an estimate is computed
by each sensor maintaining a random sample of its data observations. Our approach
differs in at least four ways. First, ours does not make any assumptions about the
communication topology (e.g., that it is a tree), save that it is connected. Second, ours
computes outliers with respect to all of the data observations at each sensor, not
a sample. Third, ours can smoothly take into account spatial proximity among the
sensors ("semi-local" outliers) while Subramaniam does not focus on this task. Fourth,
our approach is designed to smoothly adjust to changes in the underlying network 
topology while Subramaniam's requires that the underlying communication tree be 
reestablished by other means before their algorithm can resume operation. 

The framework of Janakiram et al. [30] is based on a Bayesian Belief Network (BBN) 
that has been constructed over the WSN (and distributed to each sensor). Using this, 
each sensor can estimate the likelihood of an observed tuple and, therefore, detect 
outliers. However, Janakiram does not discuss the problem of updating the BBN given 
network/data change. It is not clear to what extent the BBN construction phase can 
be carried out in-network. Our approach differs in that it is in-network and designed
to smoothly adjust to changes in data/network. 

The framework of Zhuang and Chen [60] uses a wavelet-based technique for correcting
large isolated spikes from single sensor data streams. A dynamic time warping
(DTW) distance-based technique is also used to identify more steady intervals of erroneous
sensor data by comparing the data streams of spatially close sensors assumed to
produce similar data streams. To reduce energy consumption, anomalous data streams
are not transmitted to the base station. Our method is similar in that it is in-network.
However, Zhuang and Chen's use of DTW is tightly integrated with a minimum hop
count routing algorithm, which makes the approach more restrictive than ours.

Rajasegarar et al. [45] describe an approach that is based on distributed non-parametric
anomaly detection and requires sensors to maintain a tree communication
network topology. Here each sensor clusters its sampled measurements using a fixed-width
clustering algorithm, then extracts statistics of the clusters (i.e., the centroid
and number of contained data vectors) and then sends them to its parent node. The parent
uses its children's cluster statistics to form a merged cluster and then transmits
that cluster to its parent. This process continues recursively until the base station receives
all clusters, after which it will perform anomaly detection to identify all outliers.
While this approach supports energy-efficiency by distributing the clustering operation
throughout the network, anomaly detection is only performed at the base station. Our
approach differs in that it distributes the anomaly detection process itself throughout
the network, quickly enabling nodes to identify outliers and autonomously make further
data processing decisions. Also, our approach does not rely on the use and maintenance
of a routing tree and hence is able to smoothly adjust to changes in the underlying
network topology.

Adam et al. [1] address the issue of accounting for spatially neighboring peers 
when detecting outliers in sensor networks. However, they assume the sensor datasets 
are centralized and the outlier processing is carried out there. They do not consider 
the problem of carrying out the outlier detection in-network as we do. 

Palpanas et al. [42] propose a technique for distributed deviation detection using a 
network hierarchy of low and high capacity sensors that are differentiated with respect 
to processing power and communication range. Here, low capacity sensors aim to detect 
local outliers while high capacity sensors detect more spatially dispersed outliers using 
an aggregation of low capacity sensors' data. Kernel density estimators are used to 
model the distribution of data values reported by sensors and distance-based detection 
techniques are used for identifying outliers. The authors present no formal evaluation 
of the proposed technique. Our approach differs in that it does not rely on a hierarchy 
of device capabilities. 

The framework of Radivojac et al. [44] addresses the process of sensors learning 
data distributions from class-imbalanced data. Here, sensors send data points to a 
central base station which is tasked with generating a classification model from class- 
imbalanced data (i.e., having an abundant number of negative samples and a small 
amount of positives). The model is generated using a neural network classifier, after 
which the base station distributes the model to the sensors for detection purposes. 
This process repeats throughout the lifetime of the network. A Bayesian classifier is also 
employed to extend the lifetime of the network by minimizing the total cost of detection 
and classification (e.g., costs of transmitting false-positives and false-negatives). Again,
our framework differs in that it operates in-network as opposed to a centralized manner. 

Our work in this paper is an extension of our preliminary work appearing in confer- 
ence proceedings [15]. We have extended our preliminary work by providing complete 
correctness proofs for the global outlier detection algorithm. And, we have improved 
the experimental analysis of the global algorithm. We have also added the localized 
outlier detection algorithm and experimental analysis of it. 

3.3 Distributed data mining

Distributed Data Mining (DDM) has recently emerged as an important area of re- 
search. DDM is concerned with analysis of data in distributed environments, while 
paying careful attention to issues related to computation, communication, storage, and 
human-computer interaction. Detailed surveys of Distributed Data Mining algorithms
and techniques have been presented in [32], [33], [31]. Some of the common data- 
analysis tasks include association rule mining, clustering, classification, kernel density 
estimation and so on. 

Recently, researchers have started to consider data analysis and data mining in 
large-scale dynamic networks with the goal of developing techniques that are highly 
asynchronous, scalable, and robust to network changes. Efficient data analysis algorithms
often rely on efficient primitives, so researchers have developed several different
approaches to computing basic operations (e.g., average, sum, max, or random
sampling) on dynamic networks. Mehyar et al. [39] develop an asynchronous, deterministic
technique for computing an average over a large, dynamic network. Kempe
et al. [34] and Boyd et al. [13] investigate gossip based randomized algorithms. Jela- 
sity and Eiben [36] develop the "newscast model". Bawa et al. [8] have developed an 
approach in which similar primitives are evaluated to within an error margin. Wolff 
et al. [56] develop a local algorithm for majority voting. Datta and Kargupta [23] 
develop a technique for uniformly sampling data distributed over a large-scale peer- 
to-peer network. Wolff et al. [58], Sharfman et al. [48], and Bhaduri et al. [11] de- 
velop techniques for threshold monitoring over a large, distributed set of data streams. 
Finally, some work has gone into more complex data mining tasks: association rule 
mining [56], facility location [37], decision tree induction [12], classification through 
meta-learning [38] (all four based on local majority voting), genetic algorithms [20],
k-means clustering [25] [57], web user community formation [22], hidden variable distribution
estimation in a wireless sensor network [40], and outlier detection in distributed
data streams [41] [52]. The last two papers address a problem related to ours: outlier
detection over multiple distributed data streams. However, their work is not designed 
for a WSN. For example, they rely on frequent whole-network broadcasts (Otey) or 
information centralization at a leader node (Su) - arguably reasonable approaches in 
a wired network, but very costly in a WSN. Finally, an overview of the problem of 
carrying out data mining on data distributed over a dynamic peer-to-peer network is 
given in [24]. 



4 Preliminaries 

4.1 Outlier Detection Defined 

Let $\mathbb{D}$ be a data space. We adopt a commonly used approach in the data mining/machine
learning literature in which outliers are defined by specifying a ranking function, $R$. This
function maps $x \in \mathbb{D}$ and finite $D \subseteq \mathbb{D}$ to a non-negative real number $R(x, D)$ indicating
the degree to which $x$ can be regarded as an outlier with respect to a dataset $D$.
Some common examples of $R$ include (among others): the distance to the $k$-th nearest
neighbor ([46], [9]); the average distance to the $k$ nearest neighbors ([6], [9]); and LOF
([16]). We assume that a fixed total linear order, $\prec$, on $\mathbb{D}$ is used as a tie-breaking
mechanism to ensure that $R(\cdot, Q)$ creates a total linear ordering on $\mathbb{D}$ for any finite
$Q \subseteq \mathbb{D}$. This is equivalent, for our purposes, to assuming, without loss of generality,
that $R(\cdot, Q)$ is one-to-one.

$R$ is assumed to satisfy the following two axioms. Given $x \in \mathbb{D}$, for all finite
$Q_1 \subseteq Q_2 \subseteq \mathbb{D}$: anti-monotonicity, $R(x, Q_1) \geq R(x, Q_2)$; smoothness, if
$R(x, Q_1) > R(x, Q_2)$, then there exists $z \in Q_2 \setminus Q_1$ such that $R(x, Q_1) > R(x, Q_1 \cup \{z\})$. The
anti-monotonicity axiom is similar to the Apriori rule in frequent itemset mining [2].
The smoothness axiom, intuitively, states that $R$ changes gradually. As more points
are added to $Q_1$, the rating function changes gradually to $R(x, Q_2)$. Of the examples
in the previous paragraph, all but LOF satisfy these assumptions, assuming, as we
do, the use of a tie-breaking mechanism as described in the previous paragraph.
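
The two axioms can be checked by brute force on a small one-dimensional example. The sketch below is illustrative only: the helper names (`nn_rank`, `mean_rank`, `check_axioms`) are hypothetical, the data are real numbers with absolute difference as the distance, and `mean_rank` (mean distance to all points) is a counter-example introduced here, not a ranking function discussed in the text.

```python
import math
from itertools import combinations

def nn_rank(x, q):
    """R(x, Q) for the nearest-neighbour ranking: distance from x to its closest other point in Q."""
    return min((abs(x - y) for y in q if y != x), default=math.inf)

def mean_rank(x, q):
    """Hypothetical counter-example: mean distance from x to every point in Q."""
    return sum(abs(x - y) for y in q) / len(q) if q else math.inf

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def check_axioms(rank, x, universe):
    """Return True if rank satisfies both axioms for x over every pair Q1 <= Q2 <= universe."""
    for q2 in subsets(universe):
        for q1 in subsets(q2):
            r1, r2 = rank(x, q1), rank(x, q2)
            if r1 < r2:
                return False  # anti-monotonicity violated
            if r1 > r2 and not any(rank(x, q1 | {z}) < r1 for z in q2 - q1):
                return False  # smoothness violated
    return True

nn_ok = check_axioms(nn_rank, 3, {0.5, 4, 6, 10})
mean_ok = check_axioms(mean_rank, 3, {4, 10})
```

Under these assumptions the nearest-neighbor ranking passes, while the mean-distance ranking fails anti-monotonicity, since adding a far-away point can raise the rank of $x$.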

Given $n$, a user-defined parameter, and a finite dataset $D \subseteq \mathbb{D}$, the outliers of $D$ are
denoted $O_n(D)$ and are defined to be the top $n$ points in $D$ with respect to $R(\cdot, D)$ (if
$|D| < n$, then $O_n(D)$ is defined to be $D$).
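
A minimal centralized sketch of this definition may help fix ideas. It assumes points are tuples of numbers, Euclidean distance, and a tie-breaking order that favors the larger tuple; that order merely stands in for $\prec$ and is an assumption, as are the function names.

```python
import math

def kth_nn_distance(x, data, k=1):
    """Distance from x to its k-th nearest neighbour in data (x itself excluded)."""
    dists = sorted(math.dist(x, q) for q in data if q != x)
    return dists[k - 1] if len(dists) >= k else math.inf

def avg_knn_distance(x, data, k=1):
    """Average distance from x to its k nearest neighbours in data (x itself excluded)."""
    dists = sorted(math.dist(x, q) for q in data if q != x)
    return sum(dists[:k]) / k if len(dists) >= k else math.inf

def top_n_outliers(data, n, rank=kth_nn_distance, **kw):
    """O_n(D): the n highest-ranked points; all of D when it has at most n points."""
    if len(data) <= n:
        return set(data)
    # Sort by (rank, point) so that ties in rank are broken deterministically.
    ordered = sorted(data, key=lambda x: (rank(x, data, **kw), x), reverse=True)
    return set(ordered[:n])

line = {(0.5,), (3.0,), (4.0,), (5.0,), (6.0,)}
grid = {(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)}
line_outliers = top_n_outliers(line, 1)
grid_outliers = top_n_outliers(grid, 1, rank=avg_knn_distance, k=2)
```

Any ranking function with the same `(x, data)` signature can be passed as `rank`, which mirrors the flexibility with respect to the outlier definition described above.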



4.2 Distributed System Set-up 

The distributed system architecture we assume consists of a collection of sensors, $p_i$,
each holding a finite dataset $D_i \subseteq \mathbb{D}$. Sensors communicate by exchanging messages with
their immediate neighbors as defined by an undirected graph. We assume that messages
are reliable (i.e., a message sender can assume that if a message is not received, then
the sender will be informed) and that each sensor $p_i$ can accurately maintain the list of its
immediate neighbors, $\Gamma_i$, in the graph. Our algorithms work as long as there exists a
path, possibly unknown, from each sensor to every other sensor. Note that message
reliability is difficult to fully maintain in a WSN; some message dropping is expected.
While our algorithm assumes no message dropping, modest violation of this assumption
in our experiments did not affect accuracy significantly.



5 Global Distributed Outlier Detection Algorithm 

In this section, we describe a distributed algorithm by which sensors, each assumed
to know $R$ and $n$, compute $O_n(D)$ where $D = \bigcup_i D_i$ (global outlier detection). In a
wireless sensor network, it can be desirable for sensors to find outliers only with respect
to the data contained in nearby sensors, rather than the entire network (semi-global
outlier detection). In the next section we describe how to modify the global outlier
detection algorithm to act in a semi-global manner.

At any point in time, $p_i$ keeps track of the data points it has sent to or received
from its neighbor $p_j$ at some past time. Let $D^i_{i,j}$ denote the set of points sent from $p_i$
to $p_j$, and $D^i_{j,i}$ denote the set of points sent from $p_j$ to $p_i$. Importantly, $(D^i_{i,j} \cup D^i_{j,i})$
denotes the data points that $p_i$ can be sure are commonly held with $p_j$ (there may
be more). Let $P_i$ denote $D_i \cup \bigcup_{j \in \Gamma_i} D^i_{j,i}$, the points $p_i$ is holding at the current
time.¹ $p_i$ uses $P_i$ to compute an estimate of the overall correct answer, $O_n(D)$. This
estimate, henceforth called $p_i$'s estimate, is $O_n(P_i)$, the set of outliers based on all the
information available to $p_i$ at the current time.

¹ Note the distinction between $D_i$ and $P_i$. In words, $D_i$ is the set of points that originated at
sensor $p_i$, while $P_i$ is the set of all points that $p_i$ is holding, including $D_i$ and those originating
at other sensors but propagated to sensor $p_i$ through message passing.

The algorithm does not assume any special sensors. Each sensor, $p_i$, asynchronously
waits for an event to occur: (i) the algorithm is initialized, (ii) $D_i$ changes, (iii) a
message is received from a neighbor, or (iv) a link goes up/down causing $p_i$'s immediate
neighborhood to change (however, algorithm correctness requires that we assume the
network remains connected). Note that events for $p_i$ are entirely local and can be
detected without the aid of any other sensors beyond the immediate neighborhood.
Once $p_i$ detects an event, it will decide which of the points it is currently holding ($P_i$),
if sent, could cause its neighbor, $p_j$, to change its estimate. $p_i$ then sends these points
and adds them to $D^i_{i,j}$ ($p_i$ carries out this process separately for all of its neighbors).

Gradually, the set of points held by each sensor grows until enough overlap is obtained
so that each sensor's estimate is the correct answer, $O_n(D)$. This is guaranteed
to occur once each sensor, individually, decides that none of the points it is currently
holding need be sent to its neighbors. At this point, the algorithm has terminated. To
see how all of this works, consider an example.



5.1 Example 

Let $R$ be the distance to the nearest neighbor and, given $x \in \mathbb{D}$ and finite $P \subseteq \mathbb{D}$, let
$N(x, P)$ denote the nearest neighbor of $x$ among points in $P$. Given finite $Q \subseteq \mathbb{D}$,
$N(Q, P)$ denotes $\bigcup_{x \in Q} N(x, P)$. Let $n = 1$ and consider a network of two sensors,
$p_i$ and $p_j$, each initially holding the following one-dimensional datasets. The correct
answer the algorithm will compute is $O_n(D) = \{0.5\}$.

- $D_i = \{0.5, 3, 6, 10, 11, \ldots, a\}$.

- $D_j = \{4, 5, 7, 8, 9, a+1, a+2, \ldots, a+b\}$.

- $D = D_i \cup D_j = \{0.5, 3, 4, 5, 6, 7, 8, 9, 10, 11, \ldots, a, a+1, \ldots, a+b\}$.

Initially, $P_i = D_i$, $P_j = D_j$, and $D^i_{i,j} = D^i_{j,i} = D^j_{i,j} = D^j_{j,i} = \emptyset$. For simplicity, we will
describe the algorithm in synchronous fashion starting with $p_i$. But the ideas extend
nicely to asynchronous operation.

1. $p_i$ computes its estimate as $O_n(P_i) = \{6\}$ and then must compute the set of its
   data points that might cause $p_j$ to change its estimate if sent. We call these the
   sufficient points from $P_i$ for sensor $p_j$. Formally, we define a set $Z_j \subseteq P_i$ to be
   sufficient for $p_j$ if

   $$[O_n(P_i) \cup N(O_n(P_i), P_i)] \cup [N(O_n(D^i_{i,j} \cup D^i_{j,i} \cup Z_j), P_i)] \subseteq Z_j. \qquad (1)$$

   The rationale for the first part is simple. $O_n(P_i)$ is necessary for $p_j$, because if $p_i$ is
   right in its estimate, then $p_j$ ought to know about it. Moreover, $p_i$ must also send
   $N(O_n(P_i), P_i)$, because this allows $p_j$ to determine if any of its points can cause
   the ranking of the points in $O_n(P_i)$ to change.

   The rationale for the second part is somewhat more complicated. In brief, if $p_i$ were
   to send $Z \subseteq P_i$ to $p_j$, then $p_i$ would also need to send $N(O_n(D^i_{i,j} \cup D^i_{j,i} \cup Z), P_i)$.
   To avoid resending, $p_i$ requires that $N(O_n(D^i_{i,j} \cup D^i_{j,i} \cup Z), P_i)$ is contained in $Z$.
   To understand the reasoning for all of this, consider that $(D^i_{i,j} \cup D^i_{j,i} \cup Z)$ is the
   total set of points $p_i$ knows $p_j$ has if $Z$ were to be sent to $p_j$. Thus, $O_n(D^i_{i,j} \cup D^i_{j,i} \cup Z)$
   is $p_i$'s best approximation to $p_j$'s estimate if $Z$ were to be sent to $p_j$. Hence, $p_i$ computes
   its nearest neighbors to these, because, if $p_i$ is right in its approximation, it must
   ensure that $p_j$ has these neighbors since they could cause $p_j$ to change its estimate.

   Getting back to our example, observe that $O_n(P_i) \cup N(O_n(P_i), P_i) = \{3, 6\}$. Moreover,
   $N(O_n(D^i_{i,j} \cup D^i_{j,i} \cup \{3, 6\}), P_i) = N(O_n(\{3, 6\}), P_i) = \{3\} \subseteq \{3, 6\}$. Hence,
   $Z_j = \{3, 6\}$ as this satisfies (1) above. So, $p_i$ sends $\{3, 6\} \setminus (D^i_{i,j} \cup D^i_{j,i}) = \{3, 6\}$
   and updates $D^i_{i,j}$ to $\{3, 6\}$.

2. $p_j$ will receive these points and updates $D^j_{i,j}$ to $\{3, 6\}$ (currently $D^j_{j,i} = \emptyset$). So,
   implicitly, $P_j$ now denotes $D_j \cup D^j_{i,j} = \{3, 4, 5, 6, 7, 8, 9, a+1, a+2, \ldots, a+b\}$. $p_j$
   computes $O_n(P_j) \cup N(O_n(P_j), P_j) = \{3, 4\}$ (assuming appropriate tie-breaking).
   Moreover, it can be seen that this satisfies (1). So, $p_j$ sends $\{3, 4\} \setminus (D^j_{i,j} \cup D^j_{j,i})$
   $= \{3, 4\} \setminus \{3, 6\} = \{4\}$ and updates $D^j_{j,i}$ to $\{4\}$.

3. $p_i$ will receive these points and updates $D^i_{j,i}$ to $\{4\}$ (currently $D^i_{i,j} = \{3, 6\}$).
   So, implicitly, $P_i$ now denotes $D_i \cup D^i_{j,i} = \{0.5, 3, 4, 6, 10, 11, \ldots, a\}$. $p_i$ computes
   $O_n(P_i) \cup N(O_n(P_i), P_i) = \{0.5, 3\}$. Moreover, it can be seen that this satisfies (1).
   So, $p_i$ sends $\{0.5, 3\} \setminus (D^i_{i,j} \cup D^i_{j,i}) = \{0.5, 3\} \setminus \{3, 4, 6\} = \{0.5\}$ and updates $D^i_{i,j}$
   to $\{0.5, 3, 6\}$.

4. $p_j$ will receive these points and updates $D^j_{i,j}$ to $\{0.5, 3, 6\}$ (currently $D^j_{j,i} = \{4\}$).
   So, implicitly, $P_j$ now denotes $D_j \cup D^j_{i,j} = \{0.5, 3, 4, 5, 6, 7, 8, 9, a+1, a+2, \ldots, a+b\}$.
   $p_j$ computes $O_n(P_j) \cup N(O_n(P_j), P_j) = \{0.5, 3\}$. Moreover, it can be seen that this
   satisfies (1). So, $p_j$ sends $\{0.5, 3\} \setminus (D^j_{i,j} \cup D^j_{j,i}) = \{0.5, 3\} \setminus \{0.5, 3, 4, 6\} = \emptyset$, i.e.,
   nothing is sent.

At this point the algorithm has terminated: both sensors are waiting for an event to
occur and there are no messages in flight. $P_i$ and $P_j$ denote $\{0.5, 3, 4, 6, 10, 11, \ldots, a\}$
and $\{0.5, 3, 4, 5, 6, 7, 8, 9, a+1, a+2, \ldots, a+b\}$, respectively. Therefore $O_n(P_i) =
\{0.5\} = O_n(P_j)$, which, in turn, equals the correct answer $O_n(D)$. Observe that the
total amount of communication (data points sent) was 4. The naive approach which
centralized all the data on either $p_i$ or $p_j$ requires $\min\{a - 6, b + 5\}$ communication.
For large $\min\{a, b\}$, the distributed algorithm requires much less communication.
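
The exchange above can be replayed with a small synchronous simulation. This is a sketch under stated assumptions, not the authors' implementation: the class and function names are hypothetical, $a = 30$ and $b = 20$ are arbitrary choices, ties in rank are broken in favor of the larger value, and ties between equally near neighbors in favor of the smaller one. Because the intermediate messages depend on the tie-breaking order, they need not match the trace above; only the first message and the final estimates are expected to agree.

```python
import math

def rank(x, pts):
    """R(x, pts): distance from x to its nearest other point in pts."""
    return min((abs(x - q) for q in pts if q != x), default=math.inf)

def outliers(pts, n=1):
    """O_n(pts); ties in rank are broken in favour of the larger value (an assumption)."""
    return set(sorted(pts, key=lambda x: (rank(x, pts), x), reverse=True)[:n])

def nearest(targets, pts):
    """N(Q, P): union of the nearest neighbours in pts of each target point."""
    out = set()
    for x in targets:
        others = [q for q in pts if q != x]
        if others:
            out.add(min(others, key=lambda q: (abs(x - q), q)))
    return out

def sufficient(held, known, n=1):
    """Grow Z until it satisfies condition (1) for the given held and commonly known points."""
    est = outliers(held, n)
    z = est | nearest(est, held)
    while True:
        grown = z | nearest(outliers(known | z, n), held)
        if grown == z:
            return z
        z = grown

class Sensor:
    def __init__(self, data):
        self.own, self.sent, self.received = set(data), set(), set()

    @property
    def held(self):
        return self.own | self.received

    def step(self, incoming):
        """React to an event: record new points, then return the points to send."""
        self.received |= set(incoming) - self.held
        known = self.sent | self.received
        msg = sufficient(self.held, known) - known
        self.sent |= msg
        return msg

a, b = 30, 20
p_i = Sensor([0.5, 3, 6] + list(range(10, a + 1)))
p_j = Sensor([4, 5, 7, 8, 9] + list(range(a + 1, a + b + 1)))

log = []
sender, receiver = p_i, p_j
msg = sender.step(set())           # initialization event at p_i
while msg:                         # alternate until a sensor has nothing to send
    log.append(msg)
    sender, receiver = receiver, sender
    msg = sender.step(msg)
```

After the loop ends, both sensors hold the same estimate as a centralized computation over $D_i \cup D_j$, having exchanged far fewer points than either local dataset contains.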



5.2 The Algorithm 

To translate the previous example into a formal algorithm for general $R$ (satisfying the
anti-monotonicity and smoothness axioms), we must provide some definitions generalizing
the role of $N(\cdot, \cdot)$. Given $x \in \mathbb{D}$ and finite $P \subseteq \mathbb{D}$, $Q_1 \subseteq P$ is called a support
set of $x \in \mathbb{D}$ over $P$ if $R(x, P) = R(x, Q_1)$. Intuitively, all other points from $P$ can be
discarded without affecting the rank of $x$. Using cardinality and the tie-breaking mechanism
discussed earlier ($\prec$, a total linear order on $\mathbb{D}$), we can define a unique, smallest
support set of $x$ with respect to $P$, denoted $[P|x]$.² Given $Q \subseteq P$, let $[P|Q]$ denote
$\bigcup_{x \in Q} [P|x]$. In the previous example, $R$ was the distance to the nearest neighbor, so
$[P|x]$ equaled $N(x, P)$ using $\prec$ to break ties.
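
For the $k$-th nearest neighbor ranking, a smallest support set is simply the $k$ nearest neighbors of $x$ in $P$. The sketch below illustrates this for one-dimensional data; the function names are hypothetical, and breaking distance ties toward the smaller value is an assumption standing in for $\prec$.

```python
import math

def kth_nn_distance(x, pts, k=1):
    """R(x, P): distance from x to its k-th nearest other point in pts."""
    dists = sorted(abs(x - q) for q in pts if q != x)
    return dists[k - 1] if len(dists) >= k else math.inf

def support_set(x, pts, k=1):
    """[P|x] for the k-th-NN ranking: the k nearest neighbours of x in pts.

    If pts has fewer than k other points, the rank is infinite and the
    empty set already reproduces it.
    """
    others = sorted((q for q in pts if q != x), key=lambda q: (abs(x - q), q))
    return set(others[:k]) if len(others) >= k else set()

def support_of_set(targets, pts, k=1):
    """[P|Q]: union of the support sets of every x in Q."""
    out = set()
    for x in targets:
        out |= support_set(x, pts, k)
    return out

P = {0.5, 3, 4, 6, 10}
S = support_set(3, P, k=2)
```

Discarding every point outside `S` leaves the rank of 3 unchanged, which is exactly the defining property of a support set; with `k=1` the function reduces to $N(x, P)$ from the example.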

With these more general definitions, we define a set Zj C Pj to be sufficient for pj 

if 



^ Formally, given Qi, Q2 C P support sets of x with respect to P, Qi is smaller than Q2 if 
IQil < IQ2I or (IQil = IQ2I and Q\ is lexicographically smaller than Q2 with respect to -<). 
Fig. 2 Global Outlier Detection
Algorithm 1

- Set $M = \emptyset$, and update $P_i$ accounting for all neighbors $p_j$ from which points were
received. For each point $x$ received from $p_j$, do the following. If $x$ is not already in $P_i$, then
add $x$ to $D^i_{j,i}$.

- For each $j \in \Gamma_i$, do

- - Set $Z_j = O_n(P_i) \cup [P_i|O_n(P_i)]$.

- - Repeat until no change: $Z_j = Z_j \cup [P_i|O_n(D^i_{i,j} \cup D^i_{j,i} \cup Z_j)]$.

- - If $Z_j \setminus (D^i_{i,j} \cup D^i_{j,i})$ is non-empty, then

      Append $(j, Z_j \setminus (D^i_{i,j} \cup D^i_{j,i}))$ to $M$.

      Add points in $Z_j \setminus (D^i_{i,j} \cup D^i_{j,i})$ to $D^i_{i,j}$.

- - End If.

- End For.

- If $M$ is non-empty, broadcast it to all sensors in $\Gamma_i$.

end



$$\left(O_n(P_i) \cup [P_i|O_n(P_i)]\right) \cup \left([P_i|O_n(D^i_{i,j} \cup D^i_{j,i} \cup Z_j)]\right) \subseteq Z_j. \quad (2)$$

Due to the broadcast nature of wireless sensor network communication, $p_i$ cannot send
points to a single immediate neighbor without the other neighbors receiving them
as well. In light of this, the algorithm accumulates all points (tagged with recipient
IDs) to be sent to all immediate neighbors in a single packet, $M$. When an immediate
neighbor, $p_j$, receives $M$, the neighbor extracts those points that are tagged with ID
$j$. If no points are tagged as such, $p_j$ does not regard receipt of $M$ as an event.

$p_i$ detects an event if one of the following occurs: (i) the algorithm is initialized, (ii)
$D_i$ changes, (iii) $M$ is received and contains points tagged with $i$ (i.e., points are received
from a neighbor), or (iv) a link goes up or down, causing $p_i$'s immediate neighborhood to
change (however, algorithm correctness requires that we assume the network remains
connected). In response, $p_i$ carries out the following algorithm, whose pseudo-code is
given in the Global Outlier Detection Algorithm figure. First, $P_i$ is updated accounting for
all $p_j$ from which points were received in $M$. Only points not already in $P_i$ are added
to $D^i_{j,i}$. The first two steps in the main for-loop ("For each $j \in \Gamma_i$, do") compute a
$Z_j$ satisfying (2), although the result is not guaranteed to be the smallest set to do so.
The "If ... then" in the main for-loop tests whether there are any points found sufficient
for $p_j$ that $p_i$ cannot already be sure $p_j$ has, i.e., points in $Z_j$ but not in $D^i_{i,j} \cup D^i_{j,i}$. If
any such points are found, they are added to $M$ along with their recipient ID $j$.
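The event handler described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: points are one-dimensional numbers, $R$ is the average distance to the k nearest neighbors, ties are broken by point value, and the names `Sensor`, `handle_event`, `receive`, and `run` are hypothetical. The dictionary returned by `handle_event` plays the role of the packet $M$, keyed by recipient ID.

```python
import math

def rating(x, pts, k=1):
    """R(x, P): average distance to the k nearest neighbors (inf if fewer than k)."""
    dists = sorted(abs(p - x) for p in pts if p != x)
    return sum(dists[:k]) / k if len(dists) >= k else math.inf

def support(x, pts, k=1):
    """[P|x]: the k nearest neighbors of x in P, ties broken by value."""
    ranked = sorted((p for p in pts if p != x), key=lambda p: (abs(p - x), p))
    return set(ranked[:k])

def support_of(q, pts, k=1):
    """[P|Q]: union of [P|x] over x in Q."""
    out = set()
    for x in q:
        out |= support(x, pts, k)
    return out

def top_outliers(pts, n, k=1):
    """O_n(P): the n highest-rated points of P, ties broken by value."""
    return set(sorted(pts, key=lambda x: (-rating(x, pts, k), x))[:n])

class Sensor:
    def __init__(self, sid, data, neighbors, n=1, k=1):
        self.sid, self.n, self.k = sid, n, k
        self.own = set(data)                        # D_i
        self.sent = {j: set() for j in neighbors}   # D^i_{i,j}
        self.recv = {j: set() for j in neighbors}   # D^i_{j,i}

    @property
    def held(self):                                 # P_i
        return self.own.union(*self.recv.values())

    def receive(self, j, points):
        # Only points not already in P_i are recorded as received from p_j.
        self.recv[j] |= set(points) - self.held

    def handle_event(self):
        p, message = self.held, {}
        for j in self.sent:
            known = self.sent[j] | self.recv[j]
            z = top_outliers(p, self.n, self.k)
            z |= support_of(z, p, self.k)
            while True:  # "Repeat until no change"
                estimate = top_outliers(known | z, self.n, self.k)
                grown = z | support_of(estimate, p, self.k)
                if grown == z:
                    break
                z = grown
            new = z - known
            if new:
                message[j] = new
                self.sent[j] |= new
        return message

def run(sensors):
    """Deliver messages until no sensor has anything to send; return points sent."""
    total, active = 0, True
    while active:
        active = False
        for s in sensors.values():
            for j, pts in s.handle_event().items():
                sensors[j].receive(s.sid, pts)
                total += len(pts)
                active = True
    return total
```

The `run` helper is a sequential stand-in for the radio: it ignores packet loss and timing, so it only illustrates the static case in which the data and links do not change.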

5.3 Streaming Data and Peer Addition/Deletion 

In our experiments we assume a sliding window model (based on time) in processing
the data stream arriving at each sensor. To do so, we assume each point is
time-stamped when sampled by the sensor. Under the assumption that the sensor clocks
are synchronized sufficiently well, sensor $p_i$ deletes all points in $P_i$ (regardless of where
they were originally sampled) once their time-stamp indicates they are no longer in
the window.
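A minimal sketch of this time-based expiry is shown below. The `Point` fields and the `expire` function are hypothetical names, and treating a point whose age exactly equals the window length as expired is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    value: float
    timestamp: float  # time the sample was taken, in seconds
    origin: str       # ID of the sensor that sampled it (hypothetical field)

def expire(points, now, window):
    """Keep only points whose time-stamp lies within the last `window` seconds."""
    return {p for p in points if now - p.timestamp < window}
```

Because the filter looks only at the time-stamp, points received from other sensors age out exactly like locally sampled ones.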

The algorithm can be easily modified to accommodate the addition of sensors during
operation. All that is required is to treat the arrival of a new sensor as an event for
the new sensor and for all its immediate neighbors. The algorithm can also be modified
to accommodate the removal of sensors (e.g., when their battery is depleted), assuming
that the network remains connected. In the sliding window model, a simple strategy is
to merely allow points that originated with the removed sensor to age out of the window,
at the expense of tolerating, strictly speaking, an inaccurate result until this happens.
A more general and complex solution is to propagate messages into the network, causing
sensors to explicitly delete those points that originated with the removed sensor. We
leave the details of this approach to future work.

5.4 Algorithm Correctness 

The correctness of the algorithm can be proved in the following sense: if the data and
network links remain static (and the network is connected), then communication will
eventually stop, at which point every sensor's outlier estimate will equal $O_n(D)$. It is
important to emphasize that this does not mean the algorithm cannot handle dynamic
data or network links; it means merely that, upon such a change, the algorithm will respond and
converge on the correct answer. Naturally, such convergence is guaranteed only if
the data and network remain static long enough.

It is easy to see that, barring data or network change, the algorithm will always
terminate. So, the proof proceeds in two steps. First, upon termination, all sensors
have the same outlier estimates and support (Theorem 1). Second, the consistent
outlier estimate shared by all sensors is indeed the correct one (Theorem 2). Proofs of
Theorems 1 and 2 are provided in Appendix 9.

Theorem 1 Assuming a connected network, if for all sensors $p_i$: $D_i$ and $\Gamma_i$ do not
change, then upon termination of the algorithm all sensors' outlier estimates and
supports agree: for all $p_i, p_j$: $O_n(P_i) = O_n(P_j)$ and $[P_i|O_n(P_i)] = [P_j|O_n(P_j)]$.

Theorem 2 Assuming a connected network, if for all sensors $p_i$: $D_i$ and $\Gamma_i$ do not
change, then upon termination of the algorithm, all sensors' outlier estimates will be
correct: for all $p_i$: $O_n(P_i) = O_n(D)$.

Comments: 1) Theorem 1 holds without the smoothness axiom; hence, for any
anti-monotonic $R$, upon convergence, all sensors will agree on their outlier estimate and
support. However, without the smoothness axiom, Theorem 2 does not hold, i.e., the
consistent outlier estimate might not be the correct one. There are counter-examples
which show how an anti-monotonic but not smooth $R$ can cause the algorithm to terminate
without all sensors agreeing upon the correct set of outliers.

2) For an arbitrary $R$, it is not clear how to efficiently compute $[P|x]$ and we do
not address the issue. However, efficient computation is straightforward for the $R$ we
consider in our experiments: average distance to the $k$ nearest neighbors.

6 Semi-Global Distributed Outlier Detection Algorithm 

It can be desirable for sensors to find outliers only with respect to the data contained 
in nearby sensors, rather than the entire network. In this section, we describe how to 
modify the global outlier detection algorithm to act in a semi-global manner. Under 
this approach, each sensor computes outliers only from within those points sampled in 
its spatial proximity. 
To account for spatial locality, we use hop distance: the number of hops between
two sensors along their shortest path in the underlying communication network. Given
integer $d$ and sensor $p_i$, let $D_i^{\le d}$ denote the union of all $D_j$ such that $p_j$ and $p_i$ have hop
distance no greater than $d$. The semi-global outlier detection problem requires each $p_i$
to compute $O_n(D_i^{\le d})$. Setting $d$ to infinity yields the global outlier detection problem
discussed earlier.
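As a reference for what each sensor is asked to compute, the set $D_i^{\le d}$ can be built offline with a breadth-first search over the communication graph. The sketch below is a centralized check, not the in-network algorithm; `adjacency` (node ID to neighbor IDs) and `data` (node ID to locally sampled points) are hypothetical inputs.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def data_within(adjacency, data, source, d):
    """Union of D_j over all sensors p_j within hop distance d of `source`."""
    result = set()
    for node, hops in hop_distances(adjacency, source).items():
        if hops <= d:
            result |= set(data[node])
    return result
```

Passing `float("inf")` as `d` returns the data of every reachable sensor, which corresponds to the global problem.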

To account for hop distance, each data point $x$ has an additional field x.hop (at
birth x.hop is set to zero). Let x.rest denote all the remaining fields; these are the
ones used by the rating function $R$. Given a set of points $Q$, for $0 \le h \le d$, let $Q^{\le h}$
be the set of points $x \in Q$ with x.hop $\le h$. Let $[Q]^{\min}$ be the result of replacing all
points that differ only in their hop field by the point with the smallest hop field. For
example, consider $Q = \{w, v, x, y, z\}$ where w.rest = v.rest, x.rest = y.rest = z.rest,
and v.rest $\ne$ x.rest. If w.hop < v.hop and x.hop < y.hop, z.hop, then $[Q]^{\min} = \{w, x\}$.
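The two set operators just defined can be sketched directly. The `Point` tuple and the function names `at_most_hops` and `min_hop` are hypothetical, and `rest` is assumed to be a hashable value such as a string or a tuple of measurements.

```python
from collections import namedtuple

# rest: the fields used by the rating function; hop: hops traveled so far.
Point = namedtuple("Point", ["rest", "hop"])

def at_most_hops(q, h):
    """Q^{<=h}: the points of Q whose hop field is at most h."""
    return {x for x in q if x.hop <= h}

def min_hop(q):
    """[Q]^min: among points with equal rest fields, keep the smallest hop."""
    best = {}
    for x in q:
        if x.rest not in best or x.hop < best[x.rest].hop:
            best[x.rest] = x
    return set(best.values())
```

Applied to the example in the text, five points sharing two distinct rest values collapse to the two points with the smallest hop counts.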



6.1 Semi-Global Outlier Detection Algorithm 

The basic idea is that each sensor $p_i$ will run the global outlier detection algorithm over
only those points arising on sensors within $d$ hops. At first glance, the following simple
modification of the global algorithm seems adequate. Before $p_i$ sends a copy of a point,
$x$, to its neighbors, it first increments x.hop and sends only if x.hop $\le d$. Unfortunately,
such a simple modification will not work. It does not take into account the fact that
$x$ should not have any effect on the outlier determination process of sensors $p_j$ whose
distance from $p_i$ is more than $d$ - x.hop. Examples can be demonstrated wherein this
omission causes an incorrect overall result.

To avoid this problem, $p_i$ must partition $P_i$ into $d$ parts: $P_i^{\le h}$ for $0 \le h \le d - 1$.
For each, in essence, the global outlier detection algorithm is applied. Upon detecting
an event (defined as before), $p_i$ carries out the following algorithm, whose pseudo-code
is given in the Semi-Global Outlier Detection Algorithm figure.

First, $P_i$ is updated accounting for all $p_j$ from which points were received in $M$.
Because of the hop fields, the update step is somewhat more complicated than that of
the Global Outlier Detection Algorithm. A point $x$ from $p_j$ is added to $D^i_{j,i}$ if there
does not exist $y \in P_i$ with x.rest = y.rest ($x$ does not already appear in $P_i$). Or, if
there does exist $y \in P_i$ with x.rest = y.rest, but x.hop < y.hop, then $x$ replaces $y$ in
$P_i$ (updating as needed $D_i$ and $D^i_{f,i}$ for each $f \in \Gamma_i$). Note, there cannot be more than
one $y$ with the same rest fields as $x$, since all but the point with the smallest hop would
have been removed earlier.
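The update rule for received points can be sketched as below, assuming $P_i$ is stored as a dictionary from rest fields to the smallest hop count seen so far. The function name `merge_received` and this storage layout are assumptions for illustration; the bookkeeping of $D_i$ and the per-neighbor sets is omitted.

```python
def merge_received(held, received):
    """Merge (rest, hop) pairs into `held` (rest -> hop); return what changed.

    A point is added if its rest fields are new, and replaces the stored copy
    if it arrives with a strictly smaller hop count.
    """
    changed = []
    for rest, hop in received:
        if rest not in held or hop < held[rest]:
            held[rest] = hop
            changed.append((rest, hop))
    return changed
```

Keying the store by rest fields enforces the invariant noted above: at most one copy of each point is held, namely the one with the smallest hop.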

Next, for each neighbor $p_j$ and each $0 \le h \le d - 1$, a set $Z_j^h$ is computed which
satisfies (2) with $Z_j$, $P_i$, $D^i_{i,j}$, and $D^i_{j,i}$ replaced by $Z_j^h$, $P_i^{\le h}$, $(D^i_{i,j})^{\le h}$, and $(D^i_{j,i})^{\le h}$,
respectively. This computation is done by the first steps inside the nested for loops.
Then, the hop field for each point in $Z_j^h$ is incremented in preparation for sending to
$p_j$.

Once all $0 \le h \le d - 1$ have been processed for $p_j$ (the inner for loop completes),
$Z_j^0, \ldots, Z_j^{d-1}$ are unioned and redundancies are eliminated. For any pair of points $x, y$
in $\bigcup_{h=0}^{d-1} Z_j^h$, if x.rest = y.rest and x.hop < y.hop, then $y$ is dropped (this action is
signified by the 'min' superscript in the step immediately after the inner for loop).
Then, all points $x$ are removed from $Z_j$ if there exists a point $y$ in $(D^i_{i,j} \cup D^i_{j,i})$ with
the same rest fields but y.hop $\le$ x.hop. If the resulting $Z_j$ is non-empty, then these
Fig. 3 Semi-Global Outlier Detection
Algorithm 2

- Set $M = \emptyset$, and update $P_i$ accounting for all neighbors $p_j$ from which points were
received. For each point $x$ received from $p_j$, do the following. If there does not exist $y$ in
$P_i$ with x.rest = y.rest, then add $x$ to $P_i$ (and update $D^i_{j,i}$); otherwise, if x.hop < y.hop,
then replace $y$ with $x$ in $P_i$ (updating as needed $D_i$ and $D^i_{f,i}$ for each $f \in \Gamma_i$).

- For each $j \in \Gamma_i$, do

- - For $h = 0$ to $d - 1$

      Set $Z_j^h = O_n(P_i^{\le h}) \cup [P_i^{\le h}|O_n(P_i^{\le h})]$.

      Repeat until no change: $Z_j^h = Z_j^h \cup [P_i^{\le h}|O_n((D^i_{i,j})^{\le h} \cup (D^i_{j,i})^{\le h} \cup Z_j^h)]$.

      Increment the hop field for each point in $Z_j^h$.

- - End For.

- - Set $Z_j = \left[\bigcup_{h=0}^{d-1} Z_j^h\right]^{\min}$.

- - Remove points $x$ from $Z_j$ such that there exists $y \in (D^i_{i,j} \cup D^i_{j,i})$ with x.rest = y.rest
and y.hop $\le$ x.hop.

- - If $Z_j$ is non-empty, then

      Append $(j, Z_j)$ to $M$.

      Update $D^i_{i,j}$ by adding the points in $Z_j$.

- - End If.

- End For.

- If $M$ is non-empty, broadcast it to all sensors in $\Gamma_i$.

end



points are added to $M$ (along with ID $j$) for broadcast to neighbors. And, $D^i_{i,j}$ is
updated by adding the points in $Z_j$.



7 Performance evaluation 

7.1 Experimentation setup 

We used the SENSE wireless sensor network simulator [18] to evaluate the performance 
of the global and semi-global outlier detection algorithms. Specifically, we analyzed the 
following metrics: (1) the accuracy of the algorithms in detecting outliers; (2) the av- 
erage amounts of total energy, transmission energy, and receive energy consumed per 
node per sampling period; and (3) the minimum and maximum amount of energy con- 
sumed in the network. We observed both the global and semi-global outlier detection 
algorithms to be highly accurate as nodes converged upon the correct results approx- 
imately 99% of the time. We attribute any detection error to dropped packets. Since 
average detection accuracy was consistent across all simulation parameters, we did not 
include any accuracy-related plots in this manuscript. 

Various scenarios were used to analyze the performance of our algorithms. First, 
we compared our algorithms' energy usage against that of a purely centralized outlier 
detection algorithm. Here, all nodes periodically sent their sliding window contents to 
a central node which detected outliers based on the unioned data sets and returned the 
outliers back to the nodes. For simplicity, we configured the centralized algorithm to 
calculate only global outliers since for this algorithm, energy usage is independent of 
whether global or semi-global outliers are detected. Also, all algorithms (including the
centralized solution) were evaluated using the following two outlier ranking functions
($R$): distance to nearest neighbor (NN) and average distance to $k$ nearest neighbors
(KNN).

We chose to use a centralized algorithm for our comparison because, to the best of
our knowledge, there exist no comparable distributed solutions for WSN outlier
detection. We find such a comparison to still be valid, as many WSN deployments continue to
employ centralized configurations, citing ease of administration as well as maintenance
of a single (and "standard") point of interface with the growing number of applications
and systems in the sense-and-respond computing domain. Such reasons, however,
do not preclude the utility of a distributed algorithm such as ours, since it remains very
useful as a general data processing solution for a wide range of applications, whether
they are centralized or not.

For our data sets, we used real-world recorded sensor data streams from [29], in
which distributed data samples are both spatially and temporally correlated. The data
set we used was composed of series of data samples describing environmental phenom- 
ena such as heat, light, and temperature from 53 sensors which periodically transmitted 
individual data samples to a central base station. The data set did contain missing data 
points, which to the best of our knowledge was largely due to packet loss. Hence, we 
replaced missing data points with the average values of the data points within slid- 
ing windows preceding the missing points. This helped retain the temporal trends of 
the data streams. The data points we used contained the following features: (1) ID of 
the sensor that produced the data point; (2) epoch (sequential number denoting the 
data point's position in the sensor's entire stream); (3) data value (we specifically used 
temperature); and (4) x,y location coordinates. We used the data points' temperature 
value and location coordinates as inputs into the outlier rating functions. The location 
coordinates can represent either the place of measurements or an estimate of a posi- 
tion of a target or some other spatial information. It is important to note that these 
coordinates are a part of the data on which our algorithm works in the example. They 
might suffer errors, and become anomalous, just as would any other attribute of the 
data, due to an inaccurate initialization, power degradation, or a transmission error. 
The algorithm itself, however, would work the same regardless if such coordinates are 
given or not. 
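The gap-filling step described above (replacing a missing sample with the average of the preceding sliding window) might look like the following sketch. The function name `fill_missing`, the use of `None` to mark a missing sample, and the choice to let previously filled values count toward later averages are all assumptions; the text does not specify these details.

```python
def fill_missing(stream, window):
    """Replace None entries with the mean of up to `window` preceding values."""
    filled = []
    for value in stream:
        if value is None:
            # Look back over the preceding window, skipping unfilled gaps.
            recent = [r for r in filled[-window:] if r is not None]
            value = sum(recent) / len(recent) if recent else None
        filled.append(value)
    return filled
```

A gap at the very start of a stream has no preceding window, so this sketch leaves it unfilled rather than guessing a value.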

We originally simulated two networks based on the coordinates of the sensors in the
data set: the first of size 32 nodes (which included a uniformly random sampling of the
full network) and the second of size 53. The purpose of simulating two networks was to
examine how well the algorithm scaled with the size of the network. We found that
as the network size increased, the performance benefit of the distributed algorithms
increased in comparison to the centralized algorithm, and that performance trends for
different test variables were generally the same. Hence, we did not include any detailed
results associated with the smaller network in this manuscript.

We simulated a terrain of size 50 m x 50 m. Most hardware specifications claim that
a sensor node's transmission range typically reaches up to approximately 250 m when
properly elevated. However, when placed on the ground, the reported ranges are much
smaller [62], and for reliable communication indoors using Crossbow motes, the effective
range drops to a few meters [55]. Therefore, we configured all nodes to have a uniform
transmission range of approximately 6.77 m. We also used the hardware energy model
based on the Crossbow mote specifications [21] with a transmit/receive/idle power
setting of 0.0159 W/0.021 W/3e-6 W (assuming a 3 V power source). We simulated the
wireless transport medium using the free-space signal propagation model.
Two protocols were used for routing. For the distributed algorithms, we used simple
broadcast (as opposed to unicast) transmission with promiscuous listening that allowed 
all nodes to send data points to all their adjacent neighbors using one transmission. 
For the centralized algorithm, we used the well accepted AODV [43] wireless routing 
protocol for multi-hop communication. We note that a simple end-to-end acknowledg- 
ment mechanism was also used to reinforce reliable communication. While alternative 
protocols do exist for more data-centric and energy-efficient communication, our main 
goal was to compare the overhead between the algorithms in a straight-forward manner 
without having to involve ourselves with balancing various advantages towards either 
algorithm. 

All simulations were run for 1000 seconds of simulated time and were repeated
four times using different random number generator seed values to obtain averaged
results. As shown in the following plots, we collected results for different values of
the following algorithm parameters: (1) the length of the node's sliding window, $w$;
and (2) the number of outliers to be reported, $n$. Additionally, for the distributed
localized outlier detection algorithm, we varied the hop diameter from one to three hops.
The labeling of the data in the plots is as follows: (1) Centralized for results obtained
with the centralized algorithm; (2) Global-NN and Global-KNN for results obtained using
distributed global outlier detection with NN and KNN outlier detection ranking functions
($R$), respectively; and (3) Semi-global, epsilon = x for all results obtained using
distributed localized outlier detection, where x is the value of the hop diameter of the
spatial extent of outlier detection. For brevity, we will often refer to the different
algorithms by these labels.

7.2 Experimentation results 
7.2.1 Effect of sliding window size 

The plots in Figure 4 compare the rate of energy usage of the network between using
the centralized algorithm and the distributed algorithm for global outlier detection as $w$
increases and $n$ and $k$ remain fixed at 4. Here, we show separate plots for transmission
(TX) and receive (RX) energy for the reader who is interested in the disparity between
the energy consumption due to different radio operations. We note that data points
are missing for Global-KNN at w=40, due to the inability of our computing resources
to complete simulations for this particular algorithm at the given parameter value.
However, preliminary results based on similar simulations and statistics are shown
in [15]. Since the trends between both sets of results are nearly identical for the
non-missing data points, it is reasonable to extrapolate the values of the missing points
here.

Both figures show that as $w$ increases, Global-NN is the only algorithm that reduces
its energy usage. Figure 4 shows that Global-NN eventually becomes the most
energy-efficient solution given the domain of $w$. We attribute Global-NN's reduction in energy
usage to an increasing amount of incoming data redundancy as the size of the sliding
window increases. Since Global-NN only uses one supporting point to determine an
outlier, the probability of finding new outliers or supports with this scheme in each
new time interval as the sliding window increases is low.

Regarding the energy consumption of Global-KNN and Centralized, Figure 4 reveals
trends of increasing energy consumption as $w$ increases. However, given comparable
results in [15], we can extrapolate that Global-KNN's energy consumption
is a concave increasing function of $w$, whereas Centralized's is a convex
increasing function, which makes the latter comparatively less energy-efficient in that
it approaches a point of network failure at a higher rate. In comparison to Centralized,
the energy trend for Global-KNN for this data set indicates that when global outliers
are defined by multiple supporting points (in this case 4), the increasing size of the
time interval from which the points are chosen has a less drastic effect on the number of
messages required for the algorithm to converge. Overall, in cases where a user prefers
to use more supporting points and a larger sliding window to define outliers, energy
usage will be higher than with Global-NN, but it is still more beneficial to use a
distributed solution.

Fig. 4 Average transmission and receive energy consumed per node per sample interval vs.
w (n=4, k=4) for global outlier detection. (Plot legend: Centralized, Global-NN, Global-KNN.)

Figure 5 shows the minimum, average, and maximum amounts of energy consumption
for a sensor node as $w$ increases. Since we limit our focus primarily to the range of
a sensor's energy consumption, with the intent of analyzing how energy is balanced
under the different algorithms, we present data in terms of total energy consumption;
a separate analysis of TX and RX energy has less value here. Figure 5 further accentuates the
advantage of using the Global-NN outlier detection solution over Centralized for large
window sizes. Another observation is that the range of energy consumption for different
motes running the same detection algorithm is larger for the centralized solution than
for the distributed solution. Figure 6 clearly expresses this point by illustrating the
values shown previously in Figure 5, only this time normalizing the values with respect
to the average energy consumption. For w=10, the most energy-consuming node
consumed nearly three times more energy than the average node in the centralized algorithm
and less than twice the energy of the average node in both distributed algorithms.

Fig. 5 Average, minimum, and maximum amount of energy consumed by a node for global
outlier detection.

For the partial information for w=40, the normalized range of energy consumption
is actually lower for the centralized algorithm than for the distributed one. However,
referring back to Figure 5, the average energy consumption for a node in the
centralized case is much higher than that for the distributed case. Hence, in this case, the
normalized maximum value does not convey the full picture of the energy quality of the
compared algorithms.

Fig. 6 Normalized average, minimum, and maximum amount of energy consumed by a node
for global outlier detection.

The plots in Figure 7 compare the rate of energy usage between the centralized
algorithm and the distributed algorithm for localized outlier detection. Since the results
of using NN and KNN outlier detection methods are nearly identical, only results for
the former are shown. Again, the centralized algorithm uses much more energy than
the distributed algorithms. Regarding the distributed localized algorithms, the rate
of energy usage increases along with the value of epsilon. This is expected since, as
epsilon increases, so does the message passing overhead as data points travel farther
from their place of origin. The behavior of the distributed algorithm in the localized
case for nearest neighbor outlier detection is similar to that of the global case for the
same detection method. Energy usage generally decreases as $w$ increases. As before, we
attribute this behavior to the increasing amount of data redundancy as the size of the
sliding window increases. In general, the extent of the spatial area over which outliers
are defined affects the energy usage trends of the algorithm, but not by a significant
amount.

Fig. 7 Average transmission and receive energy consumed per node per sample interval vs.
w (n=4) for localized outlier detection using nearest neighbor outlier detection.

7.2.2 Effect of the number of reported outliers 

We now investigate how the number of outliers produced affects energy usage. Figure 9
shows the plots illustrating the performance of the localized outlier detection algorithms
under increasing values of $n$ for KNN outlier detection. Similar plots for NN detection
are omitted due to space restrictions and similarity of results; NN detection is negligibly
less energy efficient, most likely due to a lower rate of convergence. The energy usage
trends for these algorithms are straightforward and expected. Energy usage increases
along with both $n$ and epsilon, which both cause more message passing overhead with
increasing value. We also noticed that the rate at which energy usage increased was
related to epsilon. This is expected since the compounded effects of larger epsilon and
$n$ values should make a more noticeable mark on how energy is used.

Fig. 8 Average transmission and receive energy consumed per node per sample interval vs.
w (n=4, k=4) for localized outlier detection using k nearest neighbor outlier detection.



8 Conclusions 

We addressed the problem of unsupervised outlier detection in WSNs. We developed 
a solution that 

1. allows flexibility in the heuristic used to define outliers, 

2. computes the result in-network to reduce both bandwidth and energy usage, 

3. only uses single hop communication thus permitting very simple node failure de- 
tection and message reliability assurance mechanisms (e.g., carrier-sense), and 

4. seamlessly accommodates dynamic updates to data. 

We evaluated the outlier detection algorithm's behavior on real-world sensor data
using a simulated wireless sensor network. These initial results show promise for our
algorithm in that it outperforms a strictly centralized approach under some very
important circumstances. When the unabridged data from the entire sensor network are
sent to a single location, the node collecting this data, as well as its nearest neighbors,
becomes a bottleneck of the entire system. Indeed, the density of traffic in this region
is proportional to the area of coverage of the entire network, while the average node
has a traffic density proportional to the area covered by its communication range.
In the example that we simulated in the paper, the traffic in the area of the
collecting node was about 50 times more dense than in the other parts of the network. The
immediate consequence is a shorter lifetime of the network, as the nodes near the
collecting point will die of battery exhaustion when many remaining nodes
will have used just 2% of their energy. The second consequence is congestion of the
traffic, which either results in a lot of interference, necessitating retransmissions or delays,
or, alternatively, in delays imposed by a multi-slot bandwidth sharing scheme needed
to avoid transmission interference. In short, using the centralized algorithm with its
drastic imbalance of traffic density will put even the best routing protocols under
severe stress. In contrast, our distributed and localized outlier detection algorithms
avoid these difficulties.

Fig. 9 Average transmission and receive energy consumed per node per sample interval vs. n
(w=20, k=4) for localized outlier detection using k nearest neighbor outlier detection.

Our approach is well suited for applications in which the confidence of an outlier 
rating may be calculated by either an adjustment of sliding window size or the number 
of neighbors used in a distance-based outlier detection technique. We assert that these 
applications are critical for resource-constrained sensor networks for two reasons. First, 
communication is a costly activity motivating the need for only the most accurate data 
to be transmitted to a client application. Second, emerging safety-critical applications
that utilize wireless sensor networks will require the most accurate data, including
outliers. This work represents our contribution toward enabling efficient data cleaning
solutions for these types of applications.
9 Appendix: Correctness Proofs for the Global Outlier Detection Algorithm

In this section, we provide detailed proofs of Theorems 1 and 2. Before doing so, a
few technical lemmas are needed. The first two isolate a couple of useful properties
following from the axioms of $R$.

Lemma 1 For any $P \subseteq Q \subseteq D$ where $|P| \ge n$, if $O_n(P) \ne O_n(Q)$, then there exists
$x \in O_n(P)$ such that $R(x, P) > R(x, Q)$.

Proof Assume $O_n(P) \ne O_n(Q)$. Since $|O_n(P)| = |O_n(Q)| = n$, there exist $x \in
(O_n(P) \setminus O_n(Q))$ and $y \in (O_n(Q) \setminus O_n(P))$. Recall that we assume a tie-breaking
mechanism is used to ensure $R(\cdot, P)$ and $R(\cdot, Q)$ are one-to-one. Thus, by definition of
$O_n(\cdot)$, it follows that $R(x, P) > R(y, P)$ and $R(y, Q) > R(x, Q)$. The anti-monotonicity
axiom implies $R(y, P) \ge R(y, Q)$, yielding the desired result. □

Lemma 2 For any $P \subseteq D$, $x \in O_n(P)$, and $z \in P$, we have $R(x, P) = R(x, [P|O_n(P)])
= R(x, [P|O_n(P)] \cup \{z\})$.

Proof Since, by definition, $[P|x] \subseteq [P|O_n(P)] \subseteq ([P|O_n(P)] \cup \{z\}) \subseteq P$, then by the
anti-monotonicity axiom it follows that

$$R(x, P) = R(x, [P|x]) \ge R(x, [P|O_n(P)]) \ge R(x, [P|O_n(P)] \cup \{z\}) \ge R(x, P).$$

□

The last technical lemma shows that once a sensor completes its local computation,
then $(D^i_{i,j} \cup D^i_{j,i})$ contains a particular crucial set of points (among others)
needed for consistency among sensors' outlier estimates.

Lemma 3 For any $p_i$, once the main for-loop in the algorithm completes,
$[P_i|O_n(D^i_{i,j} \cup D^i_{j,i})] \subseteq (D^i_{i,j} \cup D^i_{j,i})$.

Proof Let $D^i_{i,j}(\text{before})$ denote the set of points held by $p_i$ and sent from $p_i$ to $p_j$
immediately before the execution of the "Repeat until no change: ..." step in the main
for loop for $j \in \Gamma_i$. For $\ell \ge 1$, let $Z_j(\ell)$ denote $Z_j$ immediately before the $\ell$-th iteration
in the execution of the "Repeat until no change ..." step in the main for loop for $j \in \Gamma_i$,
e.g., $Z_j(1) = O_n(P_i) \cup [P_i|O_n(P_i)]$.

By definition, $Z_j(\ell) \subseteq Z_j(\ell + 1) \subseteq P_i$ and $P_i$ is finite. Thus, let $\ell^*$ denote the
smallest integer such that $Z_j(\ell^* - 1) = Z_j(\ell^*)$. Hence, the "Repeat until no change
..." step terminates at the end of iteration $\ell^*$ and $Z_j = Z_j(\ell^*)$ in the remainder of the
main for loop. Therefore,

$$Z_j(\ell^*) = Z_j(\ell^*) \cup [P_i|O_n(D^i_{i,j}(\text{before}) \cup D^i_{j,i} \cup Z_j(\ell^*))]$$

and

$$D^i_{i,j} \cup D^i_{j,i} = D^i_{i,j}(\text{before}) \cup D^i_{j,i} \cup Z_j(\ell^*).$$

It follows that $[P_i|O_n(D^i_{i,j} \cup D^i_{j,i})] \subseteq Z_j(\ell^*) \subseteq (D^i_{i,j} \cup D^i_{j,i})$. □

Now we prove that upon termination of the algorithm, the sensors' estimates are
consistent.

Theorem 1 Assuming a connected network, if for all sensors $p_i$: $D_i$ and $\Gamma_i$ do not
change, then upon termination of the algorithm all sensors' outlier estimates and
supports agree: for all $p_i, p_j$: (i) $O_n(P_i) = O_n(P_j)$ and (ii) $[P_i|O_n(P_i)] = [P_j|O_n(P_j)]$.

Proof Since the network is connected, we may assume, without loss of generality, that
$p_i$ and $p_j$ are neighbors. To prove part (i), we will show that $O_n(P_i) = O_n(D^i_{i,j} \cup D^i_{j,i})
= O_n(D^j_{j,i} \cup D^j_{i,j}) = O_n(P_j)$. The middle equality follows from the fact that $(D^i_{i,j} \cup D^i_{j,i})
= (D^j_{i,j} \cup D^j_{j,i})$. By symmetry, it suffices to show the first equality.

Suppose $O_n(P_i) \ne O_n(D^i_{i,j} \cup D^i_{j,i})$. The following contradiction is reached. There
exists $x \in O_n(D^i_{i,j} \cup D^i_{j,i})$ such that

$$R(x, D^i_{i,j} \cup D^i_{j,i}) > R(x, P_i) = R(x, [P_i|x]) \ge R(x, [P_i|O_n(D^i_{i,j} \cup D^i_{j,i})]) \ge R(x, D^i_{i,j} \cup D^i_{j,i}).$$

The first inequality follows from Lemma 1 (with $P = (D^i_{i,j} \cup D^i_{j,i})$ and $Q = P_i$). The
equality follows from the definition of support $[\cdot|\cdot]$. The last two inequalities follow from
the anti-monotonicity of $R$ and Lemma 3.

To prove part (ii), $[P_i \mid O_n(P_i)] = [P_j \mid O_n(P_j)]$, it suffices to show that for any $x \in O_n(D^i_{i,j} \cup D^i_{j,i})$, it is the case that $[P_i \mid x] = [P_j \mid x]$. This is because $O_n(P_i) = O_n(D^i_{i,j} \cup D^i_{j,i}) = O_n(P_j)$. We will prove that $[P_i \mid x] = [P_i \cap P_j \mid x] = [P_j \mid x]$. By symmetry it is enough to show the first equality.

From Lemma 3 it follows that

$$
\begin{aligned}
[P_i \mid x] &\subseteq [P_i \mid O_n(D^i_{i,j} \cup D^i_{j,i})] \\
&\subseteq (D^i_{i,j} \cup D^i_{j,i}) \\
&\subseteq (P_i \cap P_j).
\end{aligned}
$$

Thus, the definition of support and anti-monotonicity imply $R(x, P_i) \geq R(x, [P_i \mid x]) \geq R(x, P_i \cap P_j) \geq R(x, P_i)$, and so

$$R(x, P_i) = R(x, [P_i \mid x]) = R(x, P_i \cap P_j).$$

Therefore, $[P_i \mid x]$ and $[P_i \cap P_j \mid x]$ are support sets of $x$ with respect to $P_i \cap P_j$ and $P_i$. Since $[P_i \mid x]$ ($[P_i \cap P_j \mid x]$) is the unique smallest support set for $x$ with respect to $P_i$ ($P_i \cap P_j$), it follows that $[P_i \mid x] = [P_i \cap P_j \mid x]$. □
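
The support-set argument in part (ii) can be illustrated with a minimal Python sketch. It assumes one particular ranking function, the distance from $x$ to its nearest other point, whose support set is that single nearest neighbor. The names `rank` and `support`, the sample point sets, and the tie-breaking rule are hypothetical choices made for this illustration and are not part of the algorithm analyzed here.

```python
from math import dist

def rank(x, points):
    """Assumed ranking function R(x, P): distance from x to its nearest other point in P."""
    others = [p for p in points if p != x]
    return min((dist(x, p) for p in others), default=float("inf"))

def support(x, points):
    """Assumed support set [P|x]: the single nearest neighbor of x in P.

    Ties are broken by coordinate order so the smallest support set is unique.
    """
    others = [p for p in points if p != x]
    if not others:
        return set()
    return {min(others, key=lambda p: (dist(x, p), p))}

# Hypothetical point sets held by two neighboring sensors.
P_i = {(0, 0), (1, 0), (5, 0), (9, 0)}
P_j = {(0, 0), (1, 0), (7, 0)}
x = (0, 0)
shared = P_i & P_j

# Anti-monotonicity: a subset can never give x a smaller rank.
assert rank(x, shared) >= rank(x, P_i)

# Definition of support: the support set alone reproduces the rank.
assert rank(x, support(x, P_i)) == rank(x, P_i)

# [P_i|x] lies inside the intersection, so all three support sets coincide.
assert support(x, P_i) <= shared
assert support(x, P_i) == support(x, shared) == support(x, P_j)
```

In this toy instance the nearest neighbor of $x$ is held by both sensors, which is the situation Lemma 3 guarantees for the outliers' supports once the algorithm has terminated.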

Finally, we prove that upon termination the sensors' estimates are equal to the correct answer.

Theorem 2 Assuming a connected network, if for all sensors $p_i$: $D_i$ and $\Gamma_i$ do not change, then upon termination of the algorithm, all sensors' outlier estimates will be correct: for all $p_i$: $O_n(P_i) = O_n(D)$.

Proof Suppose there exists a sensor $p_i$ such that $O_n(P_i) \neq O_n(D)$. By Lemma 1 (with $P = P_i$ and $Q = D$), there exists $x \in O_n(P_i)$ such that $R(x, P_i) > R(x, D)$. Moreover, the first equality in Lemma 2 (with $P = P_i$) implies that $R(x, [P_i \mid O_n(P_i)]) = R(x, P_i)$.

Since $R(x, [P_i \mid O_n(P_i)]) > R(x, D)$, the smoothness axiom (with $Q_1 = [P_i \mid O_n(P_i)]$ and $Q_2 = D$) implies there exists $z \in (D \setminus [P_i \mid O_n(P_i)])$ such that

$$R(x, [P_i \mid O_n(P_i)]) > R(x, [P_i \mid O_n(P_i)] \cup \{z\}).$$

This point $z$ must be contained in $P_j$ for some sensor $p_j$. Hence, the following contradiction is reached.

$$
\begin{aligned}
R(x, [P_i \mid O_n(P_i)]) &> R(x, [P_i \mid O_n(P_i)] \cup \{z\}) \\
&= R(x, [P_i \mid O_n(P_j)] \cup \{z\}) \\
&= R(x, [P_j \mid O_n(P_j)]) \\
&= R(x, [P_i \mid O_n(P_i)]).
\end{aligned}
$$

The strict inequality is the one obtained above from the smoothness axiom. The first equality follows from Theorem 1 part (i). The middle equality follows from the second equality of Lemma 2 (with $P = P_j$ and noting that $O_n(P_j) = O_n(P_i)$ by Theorem 1 part (i)). The last equality follows from Theorem 1 part (ii). □
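
The role of the smoothness axiom in this proof can be seen in a small Python sketch. It again assumes nearest-neighbor distance as the ranking function. The helper `smoothness_witness` and the sets `Q1` and `Q2` are hypothetical names introduced only for this example: given $Q_1 \subseteq Q_2$ with $R(x, Q_1) > R(x, Q_2)$, the helper searches for a single point $z \in Q_2 \setminus Q_1$ whose addition to $Q_1$ strictly lowers the rank of $x$.

```python
from math import dist

def rank(x, points):
    """Assumed ranking function R(x, P): distance from x to its nearest other point in P."""
    others = [p for p in points if p != x]
    return min((dist(x, p) for p in others), default=float("inf"))

def smoothness_witness(x, q1, q2):
    """Return a point z in q2 but not in q1 with R(x, q1) > R(x, q1 + {z}), else None."""
    base = rank(x, q1)
    for z in sorted(q2 - q1):
        if base > rank(x, q1 | {z}):
            return z
    return None

# Hypothetical data: Q1 plays the role of a sensor's support set, Q2 the global data set.
x = (0, 0)
Q1 = {(0, 0), (4, 0)}
Q2 = Q1 | {(1, 0), (6, 0)}

# The premise of the axiom holds: x ranks strictly higher in Q1 than in Q2.
assert rank(x, Q1) > rank(x, Q2)

z = smoothness_witness(x, Q1, Q2)
print(z)  # (1, 0)
```

Here the witness is the point of the larger set that lies closer to $x$ than anything in $Q_1$. In the proof, such a point is held by some sensor $p_j$, and its existence contradicts the agreement established in Theorem 1.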



